
npj Breast Cancer

Springer Science and Business Media LLC

Preprints posted in the last 30 days, ranked by how well they match npj Breast Cancer's content profile, based on 18 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.

1
Estrogen receptor-positive cell line xenograft models recapitulate metastatic dissemination and endocrine response of invasive lobular breast carcinoma

Tasdemir, N.; Savariau, L.; Scott, J.; Latoche, J.; Biery, K.; Li, Z.; Bossart, E.; Sreekumar, S.; Brown, D.; Wang, S.; Watters, R.; Nasrazadani, A.; Qin, Y.; Cao, Y.; Chen, F.; Tseng, G.; Castro, C.; Anderson, C. J.; Atkinson, J.; Hooda, J.; Lucas, P. C.; Davidson, N.; Lee, A. V.; Oesterreich, S.

2026-03-18 cancer biology 10.64898/2026.03.17.712396 medRxiv
Top 0.1%
4.8%

Invasive lobular breast carcinoma (ILC), the most common special histological subtype of breast cancer, is characterized by nearly universal expression of estrogen receptor alpha (ER) and unique sites of metastases, neither of which is fully recapitulated by genetically engineered mouse models. Using reporter-labeled ILC mouse xenografts, herein we used mammary fat pad, tail vein and intracardiac orthotopic growth to analyze spontaneous and experimental metastasis and gene expression. We observed ER-positive primary tumors with single-file histology and collagen deposition, and spontaneous metastasis from the mammary fat pad to bones, ovaries, and brain including the leptomeninges, thereby closely mirroring the growth and metastatic spread of human ILC. Brain metastases showed strong ER staining, confirmed by sequencing analyses which identified estrogen signaling as top activated pathway, and the lesions exhibited robust response to endocrine therapy. In summary, we report endocrine responsive mammary fat pad, tail vein and intracardiac xenografts that faithfully demonstrate unique ILC features and can serve as invaluable pre-clinical translational platforms for validating candidate ILC genetic drivers and testing novel therapeutics.

2
MOSAIC: Explainable AI for Reproducible Histologic Grading and Prognostic Stratification in Breast Cancer

Sonpatki, P.; Gupta, S.; Biswas, A.; Patil, S.; Tyagi, S.; Balakrishnan, L.; Mistry, H.; Doshi, P.; Jagadale, K.; Shelke, P.; Parikh, L.; Shah, M.; Bharadwaj, R.; Desai, S.; Kulkarni, M.; Koppiker, C. B.; Prabhu, J.; Kachchhi, U.; Shah, N.

2026-03-18 pathology 10.64898/2026.03.11.26348043 medRxiv
Top 0.1%
4.8%

Nottingham histologic grading is essential for breast cancer prognostication but suffers from inter-observer variability in assessing mitotic activity, nuclear pleomorphism, and tubule formation. We developed MOSAIC (Mammary Oncology Spatial Analysis and Intelligent Classification), an explainable AI framework designed to perform component-wise grading by independently modeling these three histologic features. Model outputs were calibrated using a two-phase pathology study to establish clinically reproducible scoring thresholds and were subsequently evaluated across public datasets and multi-institutional Indian cohorts. MOSAIC demonstrated robust performance, with AI-derived grades providing independent prognostic information (HR >= 1.8 in two datasets, p < 0.001) and improved survival stratification compared to traditional methods. In pathologist calibration studies, AI-assisted scoring significantly reduced variability, specifically achieving near-perfect agreement in mitotic scoring with a weighted κ up to 0.98. Accuracy and Cohen's kappa (κ) analysis further characterized the model's technical performance across components: Tubule formation showed the highest agreement (Accuracy >= 0.6607, κ = 0.549), followed by overall Grade (Accuracy = 0.5637, κ = 0.539) and Mitotic activity (Accuracy = 0.4985, κ = 0.4), while Nuclear pleomorphism proved the most challenging (Accuracy = 0.3303, κ = 0.271). Comparative survival models confirmed that AI-derived grades were more significant predictors of risk than manual pathologist-assigned grades, with the AI model yielding a superior global p-value (5.9 × 10⁻⁷) and lower AIC (769.61). These results indicate that MOSAIC enables reproducible, interpretable grading by decomposing assessment into pathology-aligned components. By enhancing consistency while preserving prognostic relevance, this framework supports explainable AI as a viable assistive tool for routine breast cancer pathology.
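
For readers unfamiliar with the agreement statistic quoted in the abstract above, the following is a minimal sketch of a weighted Cohen's kappa for ordinal grades. It is not taken from the MOSAIC preprint: the function name `weighted_kappa`, the choice between linear and quadratic weights, and the toy ratings are assumptions for illustration only, and the abstract does not state which weighting scheme the authors used.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Weighted Cohen's kappa for two raters scoring the same items.

    `categories` must list the ordinal labels in order (e.g. [1, 2, 3]).
    `weights` is "linear" or "quadratic" (an illustrative choice here).
    """
    k = len(categories)
    n = len(rater_a)
    if k < 2 or n == 0 or n != len(rater_b):
        raise ValueError("need >= 2 categories and equal-length, non-empty ratings")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")

    index = {label: i for i, label in enumerate(categories)}

    # Observed joint proportions of (rater A grade, rater B grade).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # Penalty grows with the distance between the two grades.
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed_cost = sum(disagreement(i, j) * observed[i][j]
                        for i in range(k) for j in range(k))
    expected_cost = sum(disagreement(i, j) * row[i] * col[j]
                        for i in range(k) for j in range(k))
    if expected_cost == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - observed_cost / expected_cost


# Toy example: two raters assigning grades 1-3 to six slides.
pathologist = [1, 2, 3, 1, 2, 3]
model = [1, 2, 3, 1, 3, 3]
print(round(weighted_kappa(pathologist, model, [1, 2, 3]), 3))
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement. The weighted form penalises a grade 1 versus grade 3 disagreement more than a grade 1 versus grade 2 disagreement, which suits ordinal scores such as Nottingham components.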

3
The tumour microenvironment influences long-term tamoxifen benefit in postmenopausal ER+/HER2- breast cancer patients.

Camargo Romera, P.; Castresana Aguirre, M.; Danielsson, O.; Dar, H.; Ostman, A.; Czene, K.; Lindstrom, L. S.; Tobin, N. P.

2026-03-26 oncology 10.64898/2026.03.24.26349151 medRxiv
Top 0.1%
3.9%

Background: The tumour microenvironment (TME) influences breast cancer progression and treatment response. We investigated whether TME composition predicts tamoxifen benefit in postmenopausal women with oestrogen receptor-positive, HER2-negative (ER+/HER2-) breast cancer. Methods: This study included 513 patients from the Stockholm Tamoxifen (STO-3) trial, which randomised postmenopausal, lymph node-negative women to tamoxifen or no endocrine therapy. Bulk tumour transcriptomes were deconvoluted with the ConsensusTME algorithm to estimate the relative abundance of 18 immune and stromal cell types. A summary score of combined immune cells was created on a per patient basis and evaluated alongside fibroblast and endothelial stromal compartments. Patients were categorised into immune and stromal tertiles on the basis of these scores. Associations between TME composition and tumour characteristics were evaluated using Spearman correlations and Fisher's exact test. Tamoxifen benefit was analysed by univariable Kaplan-Meier (log-rank) and multivariable Cox proportional hazards adjusting for age, tumour size, grade, progesterone receptor, Ki-67, and radiotherapy. Differential expression was assessed with limma and pathway enrichment with fgsea using Hallmark gene sets from MSigDB. Results: Low immune abundance was significantly associated with higher ER expression (Fisher's exact test p < 0.001). Among tamoxifen-treated patients, those with low immune scores showed improved distant recurrence-free interval (DRFI) relative to untreated patients (log-rank p < 0.001). Similarly, intermediate endothelial (p < 0.001) and low/intermediate fibroblast abundances (p = 0.042, p = 0.009) were associated with favourable DRFI. In multivariable models, low immune (aHR = 0.17, 95% CI 0.08-0.40), intermediate endothelial (aHR = 0.21, 95% CI 0.09-0.51), and low/intermediate fibroblast tertiles (aHR = 0.50, 95% CI 0.27-0.93; aHR = 0.36, 95% CI 0.17-0.77) retained significance.
Transcriptomic analysis revealed enrichment of oestrogen-response, MYC-target, and oxidative-phosphorylation pathways in low-immune and low-fibroblast tumours, while interferon-γ response and allograft rejection pathways were downregulated. Conclusions: TME composition modulates tamoxifen benefit in postmenopausal ER+/HER2- breast cancer. Low immune, intermediate endothelial, and low/intermediate fibroblast abundances are associated with improved benefit from tamoxifen, suggesting that both immune and stromal compartments influence endocrine treatment efficacy.
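
The tertile grouping described in the abstract above can be illustrated with a short sketch. This is not the authors' code: the function name `assign_tertiles`, the toy scores, and the use of the default cut-point method in Python's `statistics.quantiles` are assumptions, since the abstract does not say how boundaries or ties were handled.

```python
from statistics import quantiles


def assign_tertiles(scores):
    """Label each score 'low', 'intermediate', or 'high' by cohort tertile.

    Cut points come from statistics.quantiles(n=3) with its default
    'exclusive' method (an assumption; other conventions exist).
    """
    lower_cut, upper_cut = quantiles(scores, n=3)
    labels = []
    for score in scores:
        if score <= lower_cut:
            labels.append("low")
        elif score <= upper_cut:
            labels.append("intermediate")
        else:
            labels.append("high")
    return labels


# Hypothetical per-patient immune summary scores.
immune_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(assign_tertiles(immune_scores))
```

Because the cut points are derived from the cohort itself, the resulting groups describe relative abundance within this study population rather than an absolute threshold that transfers to other cohorts.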

4
Artificial Intelligence and Circulating microRNA Signatures for Early Breast Cancer Detection: A Systematic Review and Meta-Analysis

Solanki, S.; Solanki, N.; Prasad, J.; Prasad, R.; Harsulkar, A.

2026-03-30 oncology 10.64898/2026.03.29.26349657 medRxiv
Top 0.1%
2.4%

Background: Early breast cancer detection remains central to improving clinical outcomes, yet conventional screening pathways, particularly mammography, have recognized limitations in sensitivity, specificity, and performance in dense breast tissue. Circulating microRNAs (miRNAs) have emerged as promising minimally invasive biomarkers, while artificial intelligence and machine learning (AI/ML) offer powerful tools for identifying diagnostically relevant multi-marker patterns within complex biomarker datasets. This systematic review and meta-analysis evaluated the diagnostic performance of AI/ML-based circulating miRNA signatures for early breast cancer detection. Methods: A systematic search of PubMed/MEDLINE, Scopus, and Web of Science Core Collection was conducted from database inception to 31 December 2025. Studies were eligible if they were original human investigations evaluating circulating miRNAs using an AI/ML-based diagnostic model for breast cancer detection and reporting extractable diagnostic performance metrics. Study selection followed PRISMA 2020 and PRISMA-DTA guidance. Methodological quality was assessed using QUADAS-2. Pooled sensitivity and specificity were synthesized using a bivariate random-effects model, and overall diagnostic performance was summarized using a hierarchical summary receiver operating characteristic framework. Results: Seven studies met the inclusion criteria for qualitative synthesis, with eligible studies contributing to the quantitative analysis depending on data availability. Across the pooled analysis, AI/ML-based circulating miRNA models demonstrated good overall diagnostic performance, with a pooled AUC of 0.905 (95% CI: 0.890 to 0.921), pooled sensitivity of 81.3% (95% CI: 76.8% to 85.2%), and pooled specificity of 87.0% (95% CI: 82.4% to 90.7%). Heterogeneity was moderate for AUC (I² = 42.3%) and sensitivity (I² = 38.7%) and low for specificity (I² = 28.4%).
Risk-of-bias assessment showed overall low-to-moderate methodological concern, with patient selection representing the most variable domain. Deeks' funnel plot asymmetry test showed no significant evidence of publication bias (p = 0.34). Conclusions: AI/ML-based circulating miRNA signatures show promising diagnostic accuracy for early breast cancer detection and may have value as non-invasive adjunctive tools within imaging-supported diagnostic pathways. However, the evidence base remains limited by methodological heterogeneity, variable validation rigor, and the predominance of retrospective case-control designs. Prospective, standardized, and externally validated studies are needed before routine clinical implementation can be justified.

5
Patient-derived organoid xenografts reveal the multifaceted role of the lncRNA MALAT1 in breast cancer progression

Aggarwal, D.; Russo, S.; Anderson, K.; Floyd, T.; Utama, R.; Rouse, J. A.; Naik, P.; Pawlak, S.; Iyer, S. V.; Kramer, M.; Satpathy, S.; Wilkinson, J. E.; Gao, Q.; Bhatia, S.; Arun, G.; Akerman, M.; McCombie, W. R.; Revenko, A.; Kostroff, K.; Spector, D. L.

2026-04-03 cancer biology 10.64898/2026.04.02.716096 medRxiv
Top 0.1%
2.1%

Background: Long non-coding RNAs (lncRNAs) have emerged as key regulators of tumor biology; however, thus far none have translated to cancer therapies. The lncRNA MALAT1 is overexpressed in more than 20 cancers, including breast cancer, and has been shown to function via various mechanisms in a context-dependent manner, in 2D cell lines and mouse models. However, its functional role and therapeutic potential have not been evaluated in clinically relevant patient-derived models. Methods: We investigated the therapeutic potential of a MALAT1-targeting antisense oligonucleotide (ASO) for breast cancer, using clinically relevant 3D human patient-derived organoids (PDOs) and PDO-xenograft (PDO-X) models. We systematically evaluated the efficiency of MALAT1-targeting ASOs using a biobank of 28 PDO models. Using three independent PDO-X models of triple negative breast cancer (TNBC), we targeted MALAT1 in vivo to study its impact on transcription, alternative splicing, stromal remodeling and metastasis. Results: Across PDO-X models, MALAT1 depletion reproducibly drove widespread alternative splicing changes across all event types, particularly intron retention events, accompanied by modest gene expression alterations. Differentially spliced transcripts were enriched for targets of shared cancer-associated transcription factors, and MALAT1 knockdown altered the relative abundance of previously unannotated splicing isoforms. Beyond tumor-intrinsic effects, tumor-specific MALAT1 depletion induced a consistent reduction in macrophage-associated gene signatures and reduced lung metastatic burden. Conclusions: Our data define MALAT1's multifaceted role in TNBC, coordinating alternative splicing, transcriptional fine-tuning, tumor-stroma crosstalk, and metastatic progression. Our study provides strong preclinical evidence supporting MALAT1-targeted ASO therapy and establishes PDO-X models as a clinically relevant platform for functional interrogation of TNBC therapies.

6
Detection of Candidate Circular RNAs to Monitor Anti-Hormonal Response in the Mammary Gland

Trummer, N.; Weyrich, M.; Ryan, P.; Furth, P. A.; Hoffmann, M.; List, M.

2026-03-30 cancer biology 10.64898/2026.03.26.714379 medRxiv
Top 0.1%
1.6%

Anti-hormonal therapies such as selective estrogen receptor modulators like tamoxifen or aromatase inhibitors like letrozole represent a cornerstone for breast cancer prevention and therapy of estrogen receptor-positive breast cancer. Therapeutic monitoring can include blood tests and imaging; however, genetically-based approaches are not yet in practice. Ideally, a test would be able to detect a positive molecular response across different estrogen pathway-suppressive approaches. Circular RNAs are a species of non-coding RNAs detectable in plasma that have been proposed as non-invasive therapeutic biomarkers. To determine whether a set of specific circular RNAs is altered across estrogen-suppressive pathway approaches, we analyzed mammary gland-specific total RNA sequencing data from two individual genetically engineered mouse models (GEMMs) of estrogen pathway-induced breast cancer, with or without exposure to tamoxifen or letrozole. The nf-core/circrna pipeline was used to identify circRNAs that were differentially expressed in response to either tamoxifen or letrozole. We then screened for circRNAs that were differentially regulated by both anti-hormonals. Four up-regulated and 31 down-regulated circRNAs with host genes known to be expressed in human breast epithelial cells were identified as showing reproducible differential regulation in response to anti-hormonal treatment.

7
Fully Automated Abstraction of Longitudinal Breast Oncology Records with Off-The-Shelf Large Language Models

Dickerson, J. C.; McClure, M. B.; Shaw, M.; Reitsma, M. B.; Dalal, N. H.; Kurian, A. W.; Caswell-Jin, J. L.

2026-03-25 oncology 10.64898/2026.03.23.26349012 medRxiv
Top 0.1%
1.4%

Background: Manual chart abstraction is a major bottleneck in clinical research. In oncology, important outcomes such as disease recurrence and the treatment history are often only documented in clinical notes, limiting the scale and quality of observational and epidemiologic studies. We developed an open-source pipeline that, in a HIPAA-compliant setting, can use any commercially available large language model (LLM) to determine whether variables from complex longitudinal oncology records can be abstracted with performance similar to that of expert medical oncologists. Methods: We randomly selected 100 patients from an institutional breast cancer cohort enriched for complex care. We abstracted a range of key variables from unstructured data, including dates of diagnosis and recurrence, clinical stage, biomarker subtype, genetic testing results, and prescribed systemic therapies, including treatment timing, intent, and reason for discontinuation. The inputs to the LLM were unnormalized, unlabeled, and unedited clinical notes, pathology reports, med admin records, and demographics. Breast oncologists abstracted the same variables to create the reference standard. For systemic therapy extraction, a second oncologist and research coordinators served as comparators. In addition to variable-level performance, we examined whether survival and hazard-ratio estimates were similar for fully LLM-derived datasets compared with expert-derived datasets. Results: Among 100 patients, the median chart had more than 3,100 pages of text; patients received a median of 7 lines of therapy over 6.5 years of follow-up. The best-performing LLM achieved 99% concordance with the expert for recurrence status, 100% for germline BRCA1/2 pathogenic variant detection, 99% for hormone receptor status, 96% for HER2 status, 91% for clinical stage, 91% for PIK3CA mutation status, and 90% for ESR1 mutation status. 
For anti-cancer drug extraction, the best-performing LLM approached inter-oncologist variability. For exact therapy-line reconstruction, mean patient-level performance remained 9 percentage points lower than the second oncologist, although inter-LLM disagreement was similar to inter-oncologist disagreement. All four LLMs tested outperformed the research coordinators on systemic therapy abstraction. Recurrence-free survival, overall survival, and hazard ratio estimates were similar between expert-derived and LLM-derived datasets. In an external cohort of 97 young patients with early-stage breast cancer, the unmodified pipeline showed similar performance for recurrence detection and adjuvant endocrine therapy use. Conclusions: Off-the-shelf LLMs in a fixed retrieval pipeline were able to abstract a range of variables from complex longitudinal oncology records with performance approaching inter-oncologist variability for key tasks, without any fine-tuning or institution-specific retraining. This approach offers a practical path to scaling the creation of research-grade retrospective datasets from narrative medical records.

8
Predicting 5-Year Breast Cancer Risk from Longitudinal Digital Breast Tomosynthesis: A Single-center Retrospective Study

Xu, Y.; Heacock, L.; Park, J.; Pasadyn, F. L.; Lei, Q.; Lewin, A.; Geras, K. J.; Moy, L.; Schnabel, F.; Shen, Y.

2026-03-24 radiology and imaging 10.64898/2026.03.22.26349001 medRxiv
Top 0.1%
1.3%

Background: Imaging-based breast cancer risk prediction models primarily use full-field digital mammography (FFDM). As digital breast tomosynthesis (DBT) has become a predominant screening modality in the United States, its potential for long-term breast cancer risk prediction remains under-explored. Objective: To develop and evaluate a deep learning model that uses longitudinal DBT exams to predict long-term breast cancer risk. Methods: This retrospective study included 313,531 DBT exams from 161,165 women (mean age, 58.5, std 11.7 years) between January 2016 and August 2020 at Institute A. A risk prediction (DRP) model was developed to estimate 2-5 year breast cancer risk using longitudinal DBT exams, patient age and breast density. Model performance was compared with a single-time point DBT model, the Mirai model using same-day FFDM, and the Tyrer-Cuzick model using the area under the receiver operating characteristic curve (AUC), time-dependent concordance index, and integrated Brier score. Results: In an independent test set (n = 34,580), the longitudinal DRP model achieved a 5-year AUC of 0.720 (95% CI, 0.703-0.738), improving on the single time point DRP model (AUC, 0.706; 95% CI, 0.687-0.724; p < 0.001) and the Mirai model (AUC, 0.687; 95% CI, 0.668-0.705; p < 0.001). In a matched case-control cohort (n=432), the DRP model achieved a 5-year AUC of 0.676 (95% CI, 0.626-0.727), compared with 0.567 (95% CI, 0.514-0.621; p < 0.001) for the Tyrer-Cuzick model. The model reclassified 37.6% (705/1,877) of women with extremely dense breasts as average risk, with a 5-year cancer incidence of 0.7% (5/705), and identified 15.5% (404/2,605) of women with fatty breasts as high risk, with a 5-year cancer incidence of 2.5% (10/404). Conclusion: A deep learning model using longitudinal DBT examinations improved long-term breast cancer risk prediction compared with FFDM-based and clinical risk models. 
Clinical Impacts: Longitudinal DBT-based risk prediction may enable dynamic risk assessment using screening images, supporting personalized screening strategies and more targeted use of supplemental imaging.

9
Virtual Spectral Decomposition with Dendritic Tile Selection: An Explainable AI Framework for Multimodal Tissue Composition Analysis and Immune Phenotyping Across Pancreatic, Lung, and Breast Cancer

Chandra, S.

2026-04-13 oncology 10.64898/2026.04.11.26350689 medRxiv
Top 0.1%
1.3%

Background: Current deep learning models in computational pathology, radiology, and digital pathology produce opaque predictions that lack the explainable artificial intelligence (xAI) capabilities required for clinical adoption. Despite achieving radiologist-level performance in tasks from whole-slide image (WSI) classification to mammographic screening, these models function as black boxes: clinicians cannot trace predictions to specific biological features, verify outputs against established morphological criteria, or integrate AI reasoning into precision oncology workflows and tumor board decision-making. Methods: We present Virtual Spectral Decomposition (VSD), a modality-agnostic, interpretable-by-design framework that decomposes medical images into six biologically interpretable tissue composition channels using sigmoid threshold functions - the same mathematical structure as CT windowing. Unlike post-hoc xAI methods (Grad-CAM, SHAP, LIME) applied to black-box deep learning models, VSD channels have pre-defined biological meanings derived from tissue physics, providing inherent explainability without sacrificing quantitative rigor. For whole-slide image (WSI) analysis in digital pathology, we introduce the dendritic tile selection algorithm, a biologically-inspired hierarchical architecture achieving 70-80% computational reduction while preferentially sampling the tumor immune microenvironment. VSD is validated across three cancer types and imaging modalities: pancreatic ductal adenocarcinoma (PDAC) on CT imaging, lung adenocarcinoma (LUAD) on H&E-stained pathology slides using TCGA data, and breast cancer on screening mammography. Composition entropy of the six-channel vector is computed as a visual Biological Entropy Index (vBEI) - an imaging biomarker quantifying the diversity of active biological defense systems. 
Results: In pancreatic cancer, the fat-to-stroma ratio (a novel CT-derived radiomics biomarker) declines from >5.0 (normal) to <0.5 (advanced PDAC), enabling early detection of desmoplastic invasion before mass formation on standard imaging. In lung cancer, composition entropy from H&E whole-slide images correlates with tumor immune microenvironment markers from RNA-seq (CD3: rho=+0.57, p=0.009; CD8: rho=+0.54, p=0.015; PD-1: rho=+0.54, p=0.013) and predicts overall survival (low entropy immune-desert phenotype: 71% mortality vs 29%, p=0.032; n=20 TCGA-LUAD), providing immune phenotyping for checkpoint immunotherapy patient selection from a $5 H&E slide without molecular assays. In breast cancer, each lesion type produces a characteristic six-channel fingerprint functioning as an interpretable computer-aided diagnosis (CAD) system for quantitative BI-RADS assessment and subtype classification (IDC vs ILC vs DCIS vs IBC). A five-level xAI audit trail provides complete traceability from clinical decision support output to specific biological structures visible on the original images. Conclusion: VSD establishes a unified, interpretable-by-design mathematical framework for explainable tissue composition analysis across imaging modalities and cancer types. Unlike black-box deep learning and post-hoc xAI approaches, VSD provides inherently interpretable, clinically verifiable cancer detection and immune phenotyping from standard clinical imaging at existing costs - without requiring foundation model infrastructure, specialized hardware, or molecular assays. The open-source pipeline (Google Colab, Supplementary Material) enables immediate reproducibility and extension to additional cancer types across the pan-cancer TCGA atlas.
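
The two building blocks named in the abstract above, a sigmoid threshold function and the entropy of a six-channel composition vector, can be sketched as follows. This is an illustrative reading, not the VSD implementation: the function names, the `center` and `width` parameters, the example channel values, and the use of natural-log Shannon entropy on the normalised vector are all assumptions, because the abstract does not give the exact definitions.

```python
import math


def sigmoid_window(value, center, width):
    """Soft threshold: about 0 well below `center`, 0.5 at it, about 1 well above.

    Plays a role similar to a CT window (level = center, softness = width).
    """
    return 1.0 / (1.0 + math.exp(-(value - center) / width))


def composition_entropy(channels):
    """Shannon entropy (natural log) of a non-negative channel vector.

    The vector is normalised to sum to 1 first; zero channels contribute 0.
    """
    total = sum(channels)
    if total <= 0 or any(c < 0 for c in channels):
        raise ValueError("channels must be non-negative with a positive sum")
    probabilities = [c / total for c in channels]
    return -sum(p * math.log(p) for p in probabilities if p > 0)


# Hypothetical six-channel tissue compositions for two image tiles.
balanced_tile = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
dominated_tile = [0.95, 0.01, 0.01, 0.01, 0.01, 0.01]
print(composition_entropy(balanced_tile))   # maximal: equals log(6)
print(composition_entropy(dominated_tile))  # much lower
```

Under this reading, entropy is highest when all six channels contribute equally and falls toward zero as a single channel dominates, which is one way to interpret the low-entropy phenotype mentioned in the abstract.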

10
Mutant p53 Directs PARP to Regulate Replication Stress and Drive Breast Cancer Metastasis

Xiao, G.; Annor, G. K.; Harmon, K. W.; Chavez, V.; Levine, F.; Ahuno, S.; St. Jean, S. C.; Madorsky Rowdo, F. P.; Leybengrub, P.; Gaglio, A.; Ellison, V.; Venkatesh, D.; Sun, S.; Merghoub, T.; Greenbaum, B.; Elemento, O.; Davis, M. B.; Ogunwobi, O.; Bargonetti, J.

2026-03-28 cancer biology 10.64898/2026.03.26.713220 medRxiv
Top 0.1%
1.3%

TP53 mutations occur in 80-90% of triple-negative breast cancers (TNBCs) and drive genomic instability and metastatic progression. Poly (ADP-ribose) polymerase (PARP) is critical for DNA repair and replication fork stability. How oncogenic signaling influences PARP function to sustain proliferation during replication stress remains unclear. Mutant p53 (mtp53) R273H associates tightly with chromatin, forms complexes with PARP, and enhances PARP recruitment to replication forks [1-3]. The C-terminal region of mtp53 mediates mtp53-PARP and mtp53-Poly (ADP-ribose) (PAR) interactions that facilitate S phase progression [4, 5]. The PARP inhibitor talazoparib (TAL) combined with the alkylating agent temozolomide (TMZ) produces synergistic cytotoxicity selectively in mtp53, but not wild-type p53 (wtp53), breast cancer cells and organoids. Herein we evaluated the mechanism of mtp53-associated cell death and tested if this could translate to a preclinical xenograft model. We found that TMZ+TAL treatment induced elevated cleaved PARP and γH2AX and reduced the metastasis-promoting oncoprotein MDMX. In orthotopic xenografts expressing mtp53 R273H, but not wtp53, combination therapy significantly decreased circulating tumor cells (CTCs) and lung metastases. Transcriptomic profiling of tumors from combination treated animals demonstrated downregulation of MDMX, VEGF, and NF-κB, consistent with the observed suppression of CTCs and lung metastasis, and increased γH2AX, indicative of replication stress in mtp53 xenografts. Inhibition of metastasis was also observed in mtp53 R273H WHIM25 and p53-undetectable WHIM6 TNBC patient-derived xenografts (PDX). The mtp53 C-terminal domain (347-393) demonstrated a critical tumor promoting function, as CRISPR-mediated deletion impaired replication fork progression, tumor growth, and metastatic dissemination.
DNA fiber combing showed that expression of full-length mtp53 R273H, but not C-terminal deleted Δ347-393, supported sustained single-stranded DNA gaps (ssGAPs) following Poly (ADP-ribose) glycohydrolase (PARG) inhibition. These findings support that mtp53 uses C-terminal amino acids to exploit PARP to enable replication stress adaptation and that mtp53 is a predictive biomarker for combined PARP inhibitor and DNA damaging therapies targeting TNBC. Significance statement: TP53 mutations are the most common genetic alterations in TNBC and a major driver of replication stress and metastasis. This study shows that missense mutant p53 uses C-terminal amino acids to reprogram PARP activity to maintain tumor cell survival under replication stress. We demonstrate that p53 status governs the response to combined PARP inhibitor (PARPi) and DNA-damaging chemotherapy, establishing an additional molecular basis beyond BRCA1 mutations for treating TNBC with PARPi therapy. These findings reveal a previously unrecognized mechanism by which the mutant p53-PARP axis enables replication stress tolerance and drives cancer metastasis. We show mutation of p53 in TNBC provides an additional biomarker-guided framework to improve PARPi therapeutic outcomes.

11
Leveraging Large Language Models to Extract Prognostic Pathology Features in Ewing Sarcoma

Huang, J.; Batool, A.; Gu, Z.; Zhao, Z.; Yao, B.; Black, J.; Davis, J.; al-Ibraheemi, A.; DuBois, S.; Barkauskas, D.; Ramakrishnan, S.; Hall, D.; Grohar, P.; Xie, Y.; Xiao, G.; Leavey, P. J.

2026-03-19 bioinformatics 10.64898/2026.02.20.707103 medRxiv
Top 0.1%
1.0%

Background: Current risk stratification for Ewing sarcoma relies heavily on clinical factors such as metastatic status, failing to capture histologic heterogeneity as a potential prognostic indicator. Although pathology reports contain rich biological data, this information remains locked in unstructured narrative text, limiting large-scale retrospective analyses. We aimed to validate the utility of Large Language Models (LLMs) for scalable data abstraction and to identify prognostic histologic features from a large multi-institutional cohort. Methods: We conducted a retrospective cohort study using data from six Children's Oncology Group (COG) clinical trials. We utilized an LLM-based pipeline (OpenAI o3) to extract structured variables, including immunohistochemical (IHC) markers and CD99 staining patterns, from digitized, Optical Character Recognition (OCR)-processed pathology reports. Extraction accuracy was validated against a human-annotated ground truth (n=200) and cross-validated against senior experts (n=48). We assessed the association between extracted features and Overall Survival (OS) using Kaplan-Meier analysis and multivariable Cox proportional hazards regression, adjusting for metastatic status. Findings: We analyzed 931 diagnostic pathology reports spanning over 19 years. The LLM achieved a weighted average accuracy of 94% across 17 IHC markers; in a cross-validation subset, the LLM outperformed human annotators (weighted average accuracy over 15 IHC markers: LLM o3: 98.1%, a resident specialist 91.4%, and a senior expert 95.9%). Survival analysis identified Neuron-Specific Enolase (NSE) and S100 as significant prognostic biomarkers. After adjusting for metastatic status, NSE positivity was associated with significantly inferior survival (HR 2.15, 95% CI 1.15-4.02, p=0.016); this risk was most pronounced in patients with non-metastatic disease (HR 5.64, p=0.0055).
Conversely, S100 positivity was associated with improved survival (HR 0.58, 95% CI 0.34-1.00, p=0.046). Interpretation: LLM-assisted extraction of pathology variables is highly accurate and scalable, capable of unlocking "dark data" from historical clinical trials. We identified NSE as a potent risk factor and S100 as a protective marker in Ewing sarcoma, particularly in localized disease. These findings suggest that AI-derived histologic data can refine risk stratification and, if validated, warrant inclusion in future prospective trials.

12
Quantitative assessment of collagen architecture from routine histopathological images shows concordance with Second Harmonic Generation microscopy

Ingawale, V.; Dandapat, K.; Konkada Manattayil, J.; Gupta, S.; Shashidhara, L. S.; Koppiker, C.; Shah, N.; Raghunathan, V.; Kulkarni, M.

2026-04-06 pathology 10.64898/2026.03.31.26349841 medRxiv
Top 0.1%
1.0%

Collagen organisation within the tumour microenvironment plays a critical role in tumour progression and has emerged as an important structural biomarker in cancer. Second Harmonic Generation (SHG) microscopy enables label-free visualisation and quantitative assessment of fibrillar collagen architecture; however, its high cost, specialised instrumentation, and limited field-of-view restrict routine clinical application. In this study, we evaluated whether collagen features quantified from digitally scanned Masson-Goldner's Trichrome-stained histopathological sections can approximate measurements obtained from SHG microscopy. Formalin-fixed paraffin-embedded breast tumour tissues, including benign and invasive ductal carcinoma (IDC) samples with varying collagen content, were analysed using SHG microscopy and whole-slide brightfield imaging. Matched regions of interest were analysed using two independent digital image analysis approaches: a conventional ImageJ-based workflow (TWOMBLI) and a machine learning-based computational pipeline. Collagen structural parameters including collagen deposition area, fibre number, and alignment metrics were quantified and compared across imaging modalities using correlation analysis. SHG signals were consistently detected from trichrome-stained sections, confirming compatibility of SHG imaging. Quantitative comparison demonstrated significant concordance between SHG-derived collagen metrics and those obtained from digital image analysis pipelines, particularly for collagen area and fibre alignment. These findings demonstrate that computational analysis of routine histopathological images can capture key spatial features of collagen organisation comparable to SHG microscopy. Digital pathology-based collagen quantification therefore represents a scalable and clinically accessible approach for assessing extracellular matrix architecture in tumour tissues.

13
Validation of Immunoscore for Prognostic Stratification in HPV-associated Oropharyngeal Cancer: An International Multicenter Study

Nguyen, D. H.; Majdi, A.; Marliot, F.; Houtart, V.; Kirilovsky, A.; Hijazi, A.; Fredriksen, T.; de Sousa Carvalho, N.; Bach, A.-S.; Gaultier, A.-L.; Fabiano, E.; Kreps, S.; Tartour, E.; Pere, H.; Veyer, D.; Blanchard, P.; Angell, H. K.; Pages, F.; Mirghani, H.; Galon, J.

2026-04-11 oncology 10.64898/2026.04.08.26350238 medRxiv
Top 0.2%
0.8%

Background: Treatment optimization in HPV-associated oropharyngeal cancer (OPSCC) remains challenging, as recent de-escalation trials have shown limited success. Current patient selection strategies based on smoking history and TNM classification are insufficient, highlighting the need for robust, standardized prognostic biomarkers. We report the first validation of the Immunoscore (IS) for prognostic stratification in HPV-associated OPSCC. Patients and methods: We analyzed 191 HPV-associated (p16+ and HPV DNA/RNA+) OPSCC patients from an international multicenter cohort (2015-2024), comprising a French monocentric retrospective training cohort (N = 48) and three validation cohorts: French monocentric retrospective (N = 48), French multicenter prospective (N = 50), and US multicenter retrospective (N = 45). IS is a standardized digital pathology assay quantifying CD3+ and CD8+ densities in tumor cores and invasive margins, with cut-offs defined in the training cohort and validated across cohorts. Associations with disease-free survival (DFS), time to recurrence (TTR) and overall survival (OS) were assessed, alongside 3' RNA-seq and sequential immunofluorescence profiling of immune composition. Results: Median age 65; 80% male; 74% smokers; 66% T1-2; 82% N0-1 (AJCC 8th). IS-High patients demonstrated superior 3-year DFS in the training and validation cohorts 1-3 (all log-rank P < 0.05). Multivariable analysis identified IS-Low as the strongest independent risk factor for DFS (HR 9.03; 95% CI: 4.02-20.31; P < 0.001). The model combining IS with clinical factors showed higher predictive accuracy for DFS (C-index 0.82) than clinical variables alone (0.7; P < 0.0001). Similar findings were observed for TTR and OS. IS-High tumors showed markedly higher enrichment of lymphoid and myeloid immune cell populations, contrasting with immune-poor signatures in IS-Low tumors.
Conclusions: IS is a robust biomarker that outperforms standard clinical variables in both prognostic and predictive accuracy. The enriched cytotoxic immune infiltrate in IS-High tumors explains favorable outcomes and supports their suitability for treatment de-escalation. Prospective validation is warranted.

14
Autopsy-based longitudinal multi-organ high-dimensional profiling reveals lineage plasticity in TRK-inhibitor-resistant secretory breast carcinoma

Muroyama, Y.; Yanagaki, M.; Tada, H.; Ebata, A.; Ito, T.; Ono, K.; Tominaga, J.; Miyashita, M.; Suzuki, T.

2026-04-08 pathology 10.64898/2026.04.06.716668 medRxiv
Top 0.2%
0.8%

Secretory breast carcinoma (SBC) is typically indolent, yet mechanisms underlying aggressiveness and therapeutic resistance to tropomyosin receptor kinase inhibitors (TRKi) remain unclear. Autopsy-based longitudinal multi-organ high-dimensional profiling of metastatic TRKi-resistant SBC demonstrated histopathological heterogeneity, including secretory and squamous components, arising from a shared clonal origin. Integrated genomic and transcriptomic analyses revealed hierarchical transcriptional rewiring consistent with a lineage-plastic state, suggesting a potential link to tumor aggressiveness and therapeutic resistance.

15
A Transformer-Based 2.5D Deep Learning Model for Preoperative Prediction of Lymph Node Metastasis in Papillary Thyroid Carcinoma

Xu, S.; Yan, X.; Su, Y.; Qi, J.; Chen, X.; Li, Y.; Xiong, H.; Jiang, J.; Wei, Z.; Chen, Z.; YALIKUN, Y.; Li, H.; Li, X.; Xi, Y.; Li, W.; Li, X.; Du, Y.

2026-04-02 oncology 10.64898/2026.04.01.26349933 medRxiv
Top 0.2%
0.8%

Background: Accurate preoperative prediction of lymph node metastasis (LNM) in papillary thyroid carcinoma (PTC) remains challenging, particularly in clinically node-negative (cN0) patients, leading to potential overtreatment. We aimed to develop and validate a Transformer-based 2.5D deep learning model (ThyLNT) using preoperative computed tomography (CT) images for robust prediction of LNM and to explore its underlying biological basis through multi-omics analyses. Methods: A total of 1,560 PTC patients from six hospitals were retrospectively included. The Tongji Hospital cohort (n=1,010) was divided into training (70%) and internal validation (30%) sets, while five independent institutions served as external test cohorts. For each lesion, seven 2.5D slices were extracted and modeled using a DenseNet201 backbone. Slice-level features were integrated using a Transformer-based feature-level fusion strategy and compared with ensemble learning, multi-instance learning (MIL), and traditional radiomics approaches. Model performance was assessed using area under the receiver operating characteristic curve (AUC), calibration analysis, decision curve analysis (DCA), and precision-recall curves. Multi-omics analyses, including bulk RNA-seq, single-cell RNA-seq, spatial transcriptomics, and spatial metabolomics, were performed to investigate biological correlates. Results: The Transformer-based model consistently outperformed comparator models across cohorts. In the training and validation cohorts, ThyLNT achieved AUCs of 0.882 and 0.787, respectively, with external AUCs ranging from 0.772 to 0.827. Compared with ultrasound (US) and CT, ThyLNT showed superior predictive performance (all P < 0.001 in the validation cohort). Simulation analysis in cN0 patients suggested that ThyLNT could reduce unnecessary lymph node dissection (LND) from 52.16% to 4.88%. 
Transcriptomic analysis combined with WGCNA and correlation analysis identified VEGFA as the gene most strongly associated with ThyLNT prediction scores. Single-cell and spatial transcriptomic analyses suggested metastasis-related tumor microenvironment remodeling, while enrichment analysis of genes affected by virtual knockout of VEGFA indicated involvement of angiogenesis- and epithelial-mesenchymal transition (EMT)-related pathways. Spatial metabolomics further revealed coordinated lipid metabolic reprogramming in metastatic tissues. These findings suggest that ThyLNT provides robust predictive performance while capturing biologically relevant features associated with metastatic progression.

16
Magnetic field-induced ER stress reprograms the tumor microenvironment to improve triple-negative breast cancer survival

Sharma, V.; Khantwal, C.; Konwar, K.

2026-03-25 cancer biology 10.64898/2026.03.22.713285 medRxiv
Top 0.3%
0.5%

Background: Non-invasive electromagnetic field (EMF)-based therapies offer a potential route to modulate local tumor-immune interactions, but their mechanistic basis remains poorly defined. Methods: We evaluated Asha therapy, a proprietary low-intensity (50 kHz, 2 mT, 25% duty cycle) alternating magnetic-field treatment, in preclinical breast cancer models. Cellular responses in human triple-negative breast cancer cell lines (MDA-MB-231 and MDA-MB-468) were evaluated using bulk RNA sequencing, quantitative proteomics, flow cytometry, and cytokine analysis. Tumor microenvironment responses in the mouse 4T1 breast cancer model were characterized using single-cell CITE-seq analysis. Functional efficacy was assessed in vivo using the murine 4T1 triple-negative breast cancer model, both as monotherapy and in combination with anti-PD1 checkpoint blockade. Clinical relevance was assessed by deriving a 19-gene neutrophil activation signature from Asha-induced transcriptional changes and projecting it onto two independent TNBC patient cohorts (METABRIC n=338, SCAN-B n=874) for survival analysis. Results: Asha therapy induced endoplasmic reticulum (ER) stress and activated an adaptive unfolded-protein response in tumor cells, triggering robust NF-κB and interferon signaling and time-dependent secretion of inflammatory cytokines. In vivo, these tumor-intrinsic changes propagated to the tumor microenvironment (TME), reprogramming fibroblasts from contractile states to immune-recruiting, interferon-responsive phenotypes and enriching for interferon-stimulated, metabolically active neutrophils and macrophages. These coordinated innate immune changes occurred without overt cytotoxicity and were associated with significant reductions in metastasis and improved survival. Combination with anti-PD1 therapy markedly enhanced efficacy, reducing lung metastasis and mortality by 88% compared with control.
The neutrophil activation signature derived from Asha-treated tumors was associated with improved overall survival in both METABRIC (log-rank p=0.036) and SCAN-B (p=0.048) TNBC cohorts by Kaplan-Meier analysis, with pooled multivariable Cox regression confirming significant survival benefit (HR=0.75, 95% CI 0.59-0.94, p=0.01). Conclusions: Asha therapy triggers a controlled ER stress response in tumor cells that drives interferon-mediated cytokine release and immune reprogramming of the TME, resulting in anti-metastatic and survival benefits. These findings identify electromagnetic-field exposure as a potential non-pharmacologic strategy to activate innate immunity and sensitize tumors to checkpoint blockade, supporting further clinical development of EMF-based immunotherapy.

17
GenBio-PathFM: A State-of-the-Art Foundation Model for Histopathology

Kapse, S.; Aygün, M.; Cole, E.; Lundberg, E.; Song, L.; Xing, E. P.

2026-03-20 bioinformatics 10.64898/2026.03.17.712534 medRxiv
Top 0.3%
0.5%

Recent advancements in histopathology foundation models (FMs) have largely been driven by scaling the training data, often utilizing massive proprietary datasets. However, the long-tailed distribution of morphological features in whole-slide images (WSIs) makes simple scaling inefficient, as common morphologies dominate the learning signal. We introduce GenBio-PathFM, a 1.1B-parameter FM that achieves state-of-the-art performance on public benchmarks while using a fraction of the training data required by current leading models. The efficiency of GenBio-PathFM is underpinned by two primary innovations: an automated data curation pipeline that prioritizes morphological diversity and a novel dual-stage learning strategy which we term JEDI (JEPA + DINO). Across the THUNDER, HEST, and PathoROB benchmarks, GenBio-PathFM demonstrates state-of-the-art accuracy and robustness. GenBio-PathFM is the strongest open-weight model to date and the only state-of-the-art model trained exclusively on public data.

18
Translating Histopathology Foundation Model Embeddings into Cellular and Molecular Features for Clinical Studies

Cui, S.; Sui, Z.; Li, Z.; Matkowskyj, K. A.; Yu, M.; Grady, W. M.; Sun, W.

2026-03-19 bioinformatics 10.64898/2026.03.17.711896 medRxiv
Top 0.3%
0.5%

AI-powered pathology foundation models provide general-purpose representations of histopathological images by encoding image tiles into numerical embeddings. However, these embeddings are not directly interpretable in biological or clinical terms and must be translated into biologically meaningful features, such as cell-type composition or gene expression, to enable downstream clinical applications. To bridge this gap, we developed STpath, a framework that integrates histopathology image embeddings derived from existing pathology foundation models with matched, spatially resolved transcriptomics data. STpath consists of cancer-specific XGBoost models trained to infer cell-type compositions and gene expression from histopathology image tiles. We evaluated STpath in colorectal and breast cancer datasets and showed that it provides accurate estimates of the composition of major cell types and the expression of a subset of genes, with further performance gains achieved by combining embeddings from multiple foundation models. Finally, we demonstrated that STpath-inferred features can be used in downstream studies to evaluate their associations with clinical outcomes.

19
Pregnancy Desire and Pregnancy Attempt: Why Words Matter in Reproductive Research -- A Nationwide cross-sectional Cohort Study

KABIRIAN, R.; Bas, R.; Chabassier, A.; Sebbag, C.; Rousset-Jablonski, C.; Bobrie, A.; Coussy, F.; Preau, M.; Espie, M.; Dumas, E.; Reyal, F.; Jacob, G.; Jochum, F.; Hamy Petit, A.-S.

2026-03-19 oncology 10.64898/2026.03.17.26348589 medRxiv
Top 0.3%
0.4%

Objective: To quantify the gap between pregnancy desire and pregnancy attempts among young women with and without a history of breast cancer (BC), and to identify factors associated with this gap. Design: Cross-sectional cohort study. Setting: The FEERIC study, conducted in France. Population: Women aged 18-43 years, with or without prior BC, who completed the inclusion forms of a collaborative study. Methods: Pregnancy desire was assessed by self-report ("Do you currently desire a pregnancy?"). Attempt was defined as engaging in unprotected intercourse with the intention to conceive. The pregnancy desire-attempt gap was defined as expressing a desire for pregnancy without actively trying to conceive. Logistic regression was used to evaluate associated demographic, clinical, and treatment-related factors. Main outcome measures: Prevalence of the pregnancy desire-attempt gap and predictors of this gap among BC survivors. Results: Of 4,351 participants (517 with BC and 3,834 controls), 735 (16.9%) reported a pregnancy desire, of whom 54% were attempting conception and 46% were not. The desire-attempt gap was significantly more frequent in women with a history of BC (OR=1.62, 95% CI 1.15-2.30). Among BC survivors, younger age (<30 years), nulliparity, being single, and ongoing endocrine therapy were independently associated with the gap, whereas prior chemotherapy or trastuzumab were not. Conclusions: Nearly half of women who report desiring pregnancy do not initiate pregnancy attempts, with a larger gap among BC survivors. These findings highlight the need to explore both medical barriers and psychosocial determinants underlying this gap and underscore the importance of refining the language used in reproductive research. Funding: This study was supported by "SHS INCa" grant no. 2016-124 and is part of a research project on young women funded by Monoprix*.

20
The Human Male Mammary Gland has Similar Epithelial Populations to Female but Distinct Composition and Transcriptional Properties

Ibanez-Rios, M.-I.; Aalam, S. M. M.; Ritting, M. L.; Jore, A.; Chaludiya, K.; Emperumal, C. P.; Jakub, J. W.; McLaughlin, S. A.; Degnim, A. C.; Couch, F.; Boughey, J. C.; Yadav, S.; Sadanandam, A.; Sherman, M. E.; Radisky, D.; Knapp, D. J. H. F.; Kannan, N.

2026-03-31 cancer biology 10.64898/2026.03.27.714915 medRxiv
Top 0.3%
0.4%

The normal adult male breast has not been characterized at single-cell resolution, leaving the cellular basis of male breast cancer (MBC) biology undefined. Here we present an integrated single-cell RNA sequencing atlas of the adult human breast comprising 174,471 cells from 17 donors (3 male, 14 female), including 18,117 male-derived cells. This revealed that the male breast retains all three epithelial populations, basal (BC), luminal progenitor (LP), and luminal committed cells (LC), but with an increase in LC at the expense of BC and LP across all three male donors. Male LC were distinguished from female by elevated ESR1 and PGR mRNA, enrichment of RNA processing and ribosome biogenesis programs, reduced inflammatory cytokine and growth factor signaling, elevated estradiol gene set enrichment scores, and higher inferred activity of developmental patterning transcription factors. This pattern was observed across differential expression, gene ontology, ligand profiling, and regulon-based analyses, and was not restricted to sex chromosome-linked gene expression. This is consistent with the near-universal estrogen receptor (ER) positivity that characterizes MBC clinically. This atlas provides the first cellular and transcriptional reference for the normal male breast and a resource for investigating sex differences in mammary biology, germline susceptibility variant interpretation, and modeling breast malignancies.